Hidden Markov model speech recognition arrangement

ABSTRACT

A speech recognizer includes a plurality of stored constrained hidden Markov model reference templates and a set of stored signals representative of prescribed acoustic features of the said plurality of reference patterns. The Markov model template includes a set of N state signals. The number of states is preselected to be independent of the reference pattern acoustic features and preferably substantially smaller than the number of acoustic feature frames of the reference patterns. An input utterance is analyzed to form a sequence of said prescribed feature signals representative of the utterance. The utterance representative prescribed feature signal sequence is combined with the N state constrained hidden Markov model template signals to form a signal representative of the probability of the utterance being each reference pattern. The input speech pattern is identified as one of the reference patterns responsive to the probability representative signals.

Our invention relates to pattern recognition and, more particularly, to arrangements for automatically identifying speech patterns.

BACKGROUND OF THE INVENTION

In communication, data processing and similar systems, it is often desirable to use audio interface arrangements. Speech input and synthesized voice output may be utilized for inquiries, commands and the exchange of data and other information. Speech type interfacing permits communication with data processor type equipment from remote locations without requiring manually operated terminals and allows concurrent performance of other functions by the user. The complexity of speech patterns and variations therein among speakers, however, makes it difficult to obtain accurate recognition. While acceptable results have been obtained in specialized applications restricted to particular individuals and constrained vocabularies, the inaccuracy of speaker-independent recognition has limited its utilization.

In general, speech recognition arrangements are adapted to transform an unknown speech pattern into a sequence of prescribed acoustic feature signals. These feature signals are then compared to previously stored sets of acoustic feature signals representative of identified reference patterns. As a result of the comparison, the unknown speech pattern is identified as the closest matching reference pattern in accordance with predetermined recognition criteria. The accuracy of such recognition systems is highly dependent on the selected features and the recognition criteria. The comparison between the input speech pattern feature sequence and a reference sequence may be direct. It is well known, however, that speech rate and articulation are highly variable.

Some prior art recognition schemes employ dynamic programming to determine an optimum alignment between patterns in the comparison process. In this way, the effects of differences in speech rate and articulation are mitigated. The signal processing arrangements for dynamic time warping and comparison are complex and time consuming since the time needed for recognition is a function of the size of the reference vocabulary and the number of reference feature templates for each vocabulary word. As a result, speaker-independent recognition for vocabularies of the order of 50 words is difficult to achieve in real time.

Another approach to speech recognition is based on probabilistic Markov models that utilize sets of states and state transitions based on statistical estimates. Speaker-dependent recognition arrangements have been devised in which spectral feature sequences are generated and evaluated in a series of hierarchical Markov models of features, words and language. The feature sequences are analyzed in Markov models of phonemic elements. The models are concatenated into larger acoustic elements, e.g., words. The results are then applied to a hierarchy of Markov models, e.g., syntactic contextual, to obtain a speech pattern identification. The use of concatenated phonemic element models and the complexity involved in unrestricted hierarchical Markov model systems, however, requires substantial training of the system by the identified speakers to obtain a sufficient number of model tokens to render the Markov models valid. It is an object of the invention to provide improved automatic speech recognition based on probabilistic modeling that is not speaker-dependent and is operable at higher speed.

BRIEF SUMMARY OF THE INVENTION

The foregoing object is achieved by storing a set of .[.prescribed.]. acoustic features of reference speech patterns and selecting a sequence of the reference pattern .[.prescribed.]. acoustic features to represent an input utterance. Templates are stored for each reference speech pattern used in recognition. Each template includes signals representative of a constrained hidden Markov model having a preselected number of states which is independent of and preferably much smaller than the number of phonemic elements in the reference speech patterns. The sequence of .[.prescribed.]. acoustic features representative of the utterance is combined with the Markov model signals of each reference template to generate signals representative of the similarity of the utterance to the reference speech patterns. Advantageously, the number of states may be selected to be substantially smaller than the number of reference pattern .[.prescribed.]. acoustic feature signals in the acoustic feature signal sequence for the shortest reference pattern. As a result of the small number of states, the recognition processing with hidden Markov model template signals is faster and has substantially lower storage requirements without reducing recognition accuracy.

The invention is directed to a speech recognition arrangement that includes storing a set of signals each representative of a .[.prescribed.]. acoustic feature of said plurality of reference patterns and storing a plurality of templates each representative of an identified spoken reference pattern. The template for each spoken reference word comprises signals representative of a first state, a last state and a preselected number of intermediate states between said first and last states of a constrained hidden Markov model of said spoken reference pattern. The number of Markov model states is independent of the number of acoustic feature elements of the identified spoken reference patterns. The template further includes a plurality of first type signals each representative of the likelihood of a .[.prescribed.]. acoustic feature being in a predetermined one of said states and a plurality of second type signals each representative of the likelihood of a transition from one of said states to another of said states of said template. Responsive to an unknown utterance, a sequence of the stored .[.prescribed.]. acoustic feature signals representative of the utterance is formed. The sequence of .[.prescribed.]. feature signals representative of the utterance and the constrained hidden Markov model signals of the reference word template are combined to produce a third type signal representative of the likelihood of the unknown utterance being the spoken reference pattern. The third type signals are compared to identify the utterance as the reference pattern. .Iadd.In a specific embodiment of the invention, the acoustic feature signals representative of reference speech patterns are prescribed to be vector-quantized representations of the speech patterns. .Iaddend.

DESCRIPTION OF THE DRAWINGS

FIG. 1 shows a schematic diagram of a constrained hidden word Markov model such as used in the invention;

FIG. 2 depicts a general flowchart illustrating the invention;

FIG. 3 depicts a block diagram of a speech recognizer circuit illustrative of the invention;

FIGS. 4, 5 and 6 are more detailed flowcharts illustrating portions of the operation of the speech recognizer circuit of FIG. 3; and

FIG. 7 shows a trellis diagram that illustrates the operation of the circuit of FIG. 3.

GENERAL DESCRIPTION

As is well known in the art, a hidden Markov model may be used to evaluate a sequence of observations O₁, O₂, . . . , O_(T) where each observation is a discrete symbol of a finite number of symbols. The sequence of observations may be modeled as a probabilistic function of an underlying Markov chain having state transitions that are not directly observable. FIG. 1 is illustrative of such a model.

In FIG. 1, there are N, e.g., 5 states and M, e.g., 64 output symbols. The transitions between states are represented by a transition matrix A=[a_(ij) ]. Each a_(ij) term is the probability of making a transition to state j given that the model is in state i. The output symbol probability of the model is represented by a matrix B=[b_(j) (O_(t))], where b_(j) (O_(t)) is the probability of outputting symbol O_(t), given that the model is in state j. The hidden Markov model may be used to derive a set of reference pattern models, one for each pattern in the vocabulary set, and to classify a sequence of observations as one of the reference patterns based on the probability of generating the unknown observations from each reference pattern model.

In speech recognition, the input speech pattern is analyzed to generate a sequence of acoustic features. Each feature may be a linear prediction coefficient vector or other acoustic feature well known in the art. The generated feature vectors are compared to a finite set of previously stored LPC feature signals and transformed into a sequence of vector quantized LPC signals representative of the input speech pattern. Each of the quantized feature signals is one of a finite set of M symbols that may be used in the hidden Markov model. In the recognition mode, the quantized LPC vector feature sequence for an utterance, e.g., a word or phrase, forms the observation sequence O and the probability of O having been generated by a reference pattern model K, e.g., a word or phrase of a vocabulary set, is formed in accordance with

    P(O|K) = P_{i_1} b_{i_1}(O_1) a_{i_1 i_2} b_{i_2}(O_2) \cdots a_{i_{T-1} i_T} b_{i_T}(O_T)    (1)

where i₁, i₂, . . . i_(T) is the maximum likelihood sequence of Markov model states and O₁, O₂, . . . , O_(T) is the observed sequence. Equation 1 may be written in terms of the forward partial probability φ_(t) (i) defined as

    \phi_t(i) = P(O_1 O_2 \cdots O_t \text{ and maximum likelihood sequence ending in state } i \text{ at time } t \mid K)    (2)

φ_(t+1) (j) can then be evaluated as ##EQU1## for

    1≦j≦N

and

    max{1, j-2}≦i≦j

where ##EQU2## so that Equation 1 becomes

    P(O|K) = P = \phi_T(N)    (4)

After the probability signal for each reference pattern model has been generated, the input speech pattern may be identified as the reference pattern model corresponding to the highest probability signal.
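The evaluation of Equation 4 can be sketched in a few lines of Python. The recursion and its starting condition appear in this text only as the ##EQU1## and ##EQU2## placeholders, so the sketch assumes the standard maximum likelihood (Viterbi) form, phi_{t+1}(j) = max over i of [phi_t(i) a_ij] times b_j(O_{t+1}), started with all probability in state 1. The function names, the 0-based indexing, the use of logarithms and the two-state toy model are illustrative assumptions and are not taken from the appendices.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Return log(p), mapping p == 0 to -inf so prohibited paths drop out."""
    return math.log(p) if p > 0.0 else NEG_INF

def viterbi_log_score(obs, a, b):
    """Log of phi_T(N) for a constrained left-to-right model.

    obs: 0-based symbol indices O_1..O_T
    a:   a[i][j] = probability of a transition from state i to state j
    b:   b[j][m] = probability of symbol m in state j
    """
    n = len(a)
    # Assumed initial condition: the model starts in state 1 (index 0).
    phi = [NEG_INF] * n
    phi[0] = safe_log(b[0][obs[0]])
    for symbol in obs[1:]:
        nxt = [NEG_INF] * n
        for j in range(n):
            # Predecessors are limited to max{1, j-2} <= i <= j.
            best = max(phi[i] + safe_log(a[i][j])
                       for i in range(max(0, j - 2), j + 1))
            nxt[j] = best + safe_log(b[j][symbol])
        phi = nxt
    return phi[n - 1]  # the path must end in the final state N

# Hypothetical two-state, two-symbol model used only to exercise the code.
toy_a = [[0.5, 0.5], [0.0, 1.0]]
toy_b = [[0.9, 0.1], [0.2, 0.8]]
toy_score = viterbi_log_score([0, 1, 1], toy_a, toy_b)
```

Working in logarithms avoids underflow for long observation sequences; the reference pattern whose model yields the largest score would then be selected.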

FIG. 2 shows a general flow chart that illustrates the operation of a hidden Markov model speech recognizer in accordance with the invention. When the recognizer is available (box 205), the input speech pattern is converted to a sequence of digital signals representative thereof as per box 210. The speech representative digital signals (box 210) are then transformed into a time frame sequence of linear predictive feature signals (box 215). While the signals generated by the operation of box 215 correspond to the acoustic features of the input speech pattern, the signals therefrom are not constrained to a finite set. Operation box 220 is effective to compare the speech pattern acoustic features to a finite set of linear predictive feature vectors and select the closest corresponding stored vector for each speech pattern feature signal. In this manner, a vector quantized feature signal from a predetermined finite set is obtained for every successive frame t of the speech pattern. The vector quantized feature signal is then the observation input to the hidden Markov model processing in frame t.

A set of predetermined models is stored. A single model is provided for each reference pattern in the recognizer vocabulary. The reference pattern model includes a state output symbol signal matrix for each model state and a transition probability signal matrix corresponding to all possible transitions between states for the reference pattern model. The reference pattern models are selected sequentially as indicated in box 225 and the probability that the LPC vector quantized feature sequence of the input speech pattern is obtained from the selected reference word model is generated and stored (box 230). After the last reference pattern model has been selected and the probability signal therefor produced, the maximum of the probability signals is selected and a signal identifying the best reference pattern is transferred to a utilization device such as a data processor or a control system.

In accordance with the invention, the hidden Markov model for each reference pattern has the number of states, e.g., 5, limited to be less than the number of feature signal time frames in the reference pattern and is constrained so that state 1 is always the first frame initial state, only a prescribed set of left-to-right state transitions is possible, and a predetermined final state is defined from which transitions to other states cannot occur. These restrictions are illustrated in the state diagram of FIG. 1. With reference to FIG. 1, state 1 is the initial state, state 5 is the final or absorbing state, and the prescribed left-to-right transitions are indicated by the directional lines among the states.

According to the state diagram of FIG. 1, it is only possible to reenter state 1 via path 111, to proceed to state 2 via path 112, or to proceed to state 3 via path 113 from state 1. In general, transitions are restricted to reentry of a state or entry to one of the next two states. We have found that these restrictions permit rapid and accurate recognition of speech patterns. The identified utterance reference pattern models for the recognizer are not restricted to the speech patterns of one identified speaker but may be derived from utterances of many different speakers so that the speech recognition is speaker independent.
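The transition restriction described above (reentry of a state, or entry to one of the next two states, with the last state absorbing) may be made concrete with a short Python sketch. The function name and the max_skip parameter are illustrative assumptions; states are numbered from 1 as in FIG. 1.

```python
def allowed_successors(n_states, max_skip=2):
    """Map each state (1-based) to the list of states it may move to.

    A state may be reentered or followed by one of the next max_skip
    states; the final state can only be reentered.
    """
    return {
        i: list(range(i, min(i + max_skip, n_states) + 1))
        for i in range(1, n_states + 1)
    }

# Five-state model of FIG. 1: state 1 may go to 1, 2 or 3; state 5 is absorbing.
successors = allowed_successors(5)
```

Any transition outside these lists corresponds to a prohibited entry of the transition matrix.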

DETAILED DESCRIPTION

FIG. 3 shows a general block diagram of a speech recognizer illustrative of the invention. The circuit of FIG. 3 is adapted to recognize speech patterns applied to electroacoustic transducer 300 and to provide prescribed control signals to utilization device 380 responsive to the identified pattern. In FIG. 3, filter and sampler circuit 310 receives an electric analog signal from transducer 300 and is operative to lowpass filter the signal so that unwanted higher frequency noise is removed. The cutoff frequency of the filter may be set to 3.2 kHz. The filtered signal is then sampled at a 6.7 kHz rate as is well known in the art.

The sampled signal is supplied to analog-to-digital converter 320 in which each successive sample is transformed into a digitally coded signal representative of the magnitude of the corresponding sample. The sequence of coded signals is applied to LPC feature signal generator 330. As is well known in the art, generator 330 temporarily stores the digital coded signal sequence, groups the signals into successive overlapping frames of 45 ms duration and produces a set of P linear prediction parameter signals for each frame. Each set of these LPC signals is representative of acoustic features of the corresponding frame. It is to be understood, however, that spectral or other acoustic feature signals may be utilized by those skilled in the art.
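The grouping of samples into overlapping frames may be illustrated as follows. The 6.7 kHz rate and the 45 ms frame duration come from the description above; the frame advance is not stated in this passage, so the 15 ms value below is a purely hypothetical choice, as are the function and parameter names.

```python
def split_into_frames(samples, rate_hz=6700, frame_ms=45, hop_ms=15):
    """Group samples into overlapping frames of frame_ms duration.

    hop_ms (the advance between frame starts) is an assumed value.
    """
    frame_len = rate_hz * frame_ms // 1000  # 301 samples at the stated values
    hop_len = rate_hz * hop_ms // 1000      # 100 samples for the assumed hop
    frames = []
    start = 0
    while start + frame_len <= len(samples):
        frames.append(samples[start:start + frame_len])
        start += hop_len
    return frames

# One second of placeholder samples, used only to exercise the function.
example_frames = split_into_frames(list(range(6700)))
```

Each frame would then be passed to the linear prediction analysis that produces the P parameter signals.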

Feature generator 330 is also operative to detect the endpoint of the input speech pattern applied to transducer 300 on the basis of an energy analysis of the feature signal sequence. The endpoint detection arrangement may be the one disclosed in U.S. Pat. No. 3,909,532 issued to L. R. Rabiner et al. on Sept. 30, 1975. Alternatively, other well known endpoint detection techniques may be used. The feature generator may comprise a microprocessor such as the type MC68000 manufactured by Motorola, Inc. having the permanently stored set of instructions listed in Fortran language in Appendix A hereto in a read only memory (ROM) to control feature signal generation and endpoint detection.

Upon detection of a speech pattern endpoint in feature generator 330, control signal ST is enabled and sent to recognition processor 340 to initiate its operations. The recognition processor may comprise a type MC68000 microprocessor described in the publication MC68000 16 Bit Microprocessor User's Manual, second edition, Motorola Inc., 1980. The operation sequence of processor 340 is controlled by the permanently stored instruction set contained in program ROM 355. These instructions are set forth in the Fortran language listing of Appendix B hereto.

Acoustic feature signal store 370 receives the frame sequence of LPC coefficient signals representative of the input speech pattern from generator 330 and stores the feature signals in addressable frame sequence order for use by recognition processor 340. Prototype signal store 365 contains signals representative of a set of predetermined LPC prototype feature signals which cover the range of expected LPC feature signals in the input speech pattern. These prototype signals provide a finite set of symbols for Markov model processing.

Markov Model Store 360 contains a set of coded signals corresponding to the hidden word Markov models of the possible reference patterns for the unknown utterance applied to transducer 300. Each Markov model comprises a set of signals, a_(ij), corresponding to the probability of transitions between model states and signals b_(j) (O_(t)) corresponding to the output symbol probability in each state. The output symbols O_(t), one for each speech pattern frame t, correspond to the prototype signals in store 365. Each of stores 360 and 365 may comprise a read only memory addressable by processor 340. ROMs 360 and 365 permanently store the model and prototype signals. Store 370 may be a random access memory addressable by processor 340. RAM store 350 is utilized as an intermediate memory for the signal processing operations of the recognition processor, and interface 345 provides a communication interface between the recognition processor and the devices in FIG. 3. Bus 345 may comprise the type HBFA-SBC614 backplane manufactured by Hybricon Corporation. Alternatively, processor 340, bus 345, control memory 350 and RAM 355 may be the type 0B68K1A MC68000/MULTIBUS single board computer manufactured by Omnibyte Corporation, West Chicago, Ill. A Q bus arrangement could also be utilized.

The circuit of FIG. 3 may be utilized to recognize many different types of patterns. For purposes of illustration, an arrangement for recognizing digits, e.g., of a telephone number or credit card number, is described. Assume an utterance of the digit "nine" is applied to transducer 300. In accordance with boxes 207 and 210 of the flow chart of FIG. 2, the input speech pattern is filtered and sampled in Filter and Sample Circuit 310 and transformed into digital signal form in A/D converter 320. The sequence of digital coded signals is supplied to the input of Feature Signal Generator 330 in which LPC coefficient feature signals are produced for the successive frames of the speech pattern "nine" as per box 215. The generated LPC feature signals are transferred to Acoustic Feature Signal Store 370 as addressed by frame index t via line 332. Decision box 218 is entered in each frame to determine whether the endpoint of the pattern has been reached. Upon detection of the endpoint, signal ST is generated in the feature signal generator and sent to recognition processor 340.

Responsive to signal ST, processor 340 is placed in its vector quantization mode during which the LPC feature signals in store 370 are quantized to the prototype signals in ROM 365 as per operation box 220. The quantization mode is shown in greater detail in the flow chart of FIG. 4, and the permanently stored instruction codes for the vector quantization mode of control program memory 355 are listed in Appendix B. Referring to FIG. 4, LPC feature signal frame index t in processor 340 is initially reset to 0 as per box 401. Loop 403 is then entered to initialize the setting of the prototype index m. In loop 403, frame index t is incremented (box 405) and the incremented frame index is compared to the last frame (T) of the input speech pattern (box 410). Until t>T, box 415 is entered so that the current frame input speech pattern LPC feature signal U_(t) in store 370 is addressed by processor 340 and transferred therefrom to RAM 350. The signal representative of the minimum distance between the prototype signal and feature signal (D_(min)) is initially set to infinity (box 420) and the prototype index m is set to 0 in processor 340 (box 425). Box 430 is then entered in which the prototype index m is incremented in processor 340. The incremented index m+1 is then compared to the last index M=64 as per box 435.

At this time, the current prototype signal in store 365 is addressed and transferred to RAM 350 via the recognition processor (box 440). The process of determining the prototype signal R_(m) that most closely corresponds to the current speech pattern feature signal U_(t) may then be started in processor 340. The processor is conditioned to iteratively generate the well known Itakura distance metric signal of the form ##EQU3## for each prototype signal where a is an LPC vector from U_(t), a is an LPC vector from R_(m) and V is the autocorrelation matrix from R_(m).

Initially, distance metric signal d(U_(t),R_(m)) and the feature index signal p are set to zero as per boxes 445 and 450. Distance signal forming loop 452 is then entered and for each feature index the distance signal is incremented in accordance with ##EQU4## as per operation box 455. Index signal p is incremented in processor 340 (box 460) and box 455 is re-entered via decision box 465 until p>P where P is the final feature index signal. The distance signal is converted to logarithmic form (box 468) and is then compared to D_(min) in decision box 470. In the event that the current prototype distance signal is equal to or greater than D_(min), box 430 is re-entered without changing D_(min). Otherwise, the prototype index signal m is stored as representative of the speech pattern quantized signal for frame t and the distance signal for prototype m is stored as D_(min) in RAM 350. Box 430 is then re-entered. When m>M in box 435, O_(t) =m is then selected as the closest corresponding quantized signal and loop 403 is entered at box 405 so that the next frame quantization can be initiated.
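The quantization loop of FIG. 4 as described above may be sketched in Python. The Itakura distance expression is not legible in this text (the ##EQU3## and ##EQU4## placeholders), so the sketch takes the distance as a pluggable function and, purely as a stand-in, uses a squared Euclidean distance. The function names and the toy data are assumptions made for illustration only.

```python
import math

def squared_distance(u, r):
    """Stand-in metric (assumption); the description uses an Itakura distance."""
    return sum((x - y) ** 2 for x, y in zip(u, r))

def quantize(features, prototypes, distance=squared_distance):
    """Return, for each frame, the 1-based index m of the closest prototype."""
    symbols = []
    for u in features:                    # frames t = 1..T (loop 403)
        d_min = math.inf                  # box 420: D_min starts at infinity
        best_m = 0                        # box 425
        for m, r in enumerate(prototypes, start=1):  # prototypes m = 1..M
            d = distance(u, r)
            if d < d_min:                 # equal or greater leaves D_min unchanged
                d_min, best_m = d, m
        symbols.append(best_m)            # O_t = m
    return symbols

# Hypothetical two-dimensional features and three prototypes.
toy_prototypes = [[0.0, 0.0], [1.0, 1.0], [5.0, 5.0]]
toy_features = [[0.1, 0.0], [4.0, 4.0], [0.5, 0.5]]
toy_symbols = quantize(toy_features, toy_prototypes)
```

Because a prototype replaces the current best only when its distance is strictly smaller, a tie is resolved in favor of the lower prototype index, which matches the decision described for box 470.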

When speech pattern frame index t becomes greater than the final speech pattern frame T as per box 410, a sequence of quantized signal indices, O₁, O₂, . . . , O_(t), . . . O_(T) has been produced for the speech pattern in processor 340 and stored in RAM 350. The speech pattern corresponding to the utterance of "nine" may, for example, have 36 frames and one of 64 possible prototype signals is chosen for each frame. In this way, the speech pattern is converted into a sequence of quantized signals of a finite set. Every quantized signal index O_(t) corresponds to a set of P linear prediction coefficients that represents the quantized acoustic feature of a frame of the speech pattern. For an utterance of the digit "nine" by an unidentified speaker, the sequence of quantized feature signals may be those listed in Table 1.

                  TABLE 1
______________________________________
       Frame   Quantized
       No.     Signal
       t       O_(t)
______________________________________
       1       14
       2       14
       3       13
       4       9
       5       1
       6       25
       7       26
       8       28
       9       28
       10      28
       11      29
       12      29
       13      19
       14      19
       15      34
       16      34
       17      50
       18      51
       19      52
       20      52
       21      52
       22      51
       23      51
       24      40
       25      46
       26      57
       27      57
       28      57
       29      57
       30      57
       31      57
       32      47
       33      17
       34      3
       35      18
       36      42
______________________________________

After quantization is completed, processor 340 exits the quantization mode and enters its Markov model evaluation mode of boxes 225, 230 and 235 in FIG. 2. The permanently stored instructions for the Markov model evaluation mode are listed in Fortran language in Appendix C hereto. During the model evaluation mode, the Markov models for the set of reference patterns, e.g., digits 0, 1, 2, . . . , 9, are successively selected. Every model comprises an A matrix of the transition probability signals and a B matrix of symbol output probability signals. The A matrices for the digits 0, 5 and 9 are shown by way of example in Tables 2, 3 and 4, respectively. Asterisks represent transitions that are prohibited by the model and are evaluated as zero.

                  TABLE 2
______________________________________
Digit 0
A Matrix
State i  1       2       3       4       5
______________________________________
 j
1        .821    *       *       *       *
2        .143    .801    *       *       *
3        .036    .199    .800    *       *
4        *       .000    .079    .880    *
5        *       *       .122    .120    1.000
______________________________________

                  TABLE 3
______________________________________
Digit 5
A Matrix
State i  1       2       3       4       5
______________________________________
 j
1        .852    *       *       *       *
2        .136    .932    *       *       *
3        .013    .067    .800    *       *
4        *       .000    .054    .922    *
5        *       *       .146    .078    1.000
______________________________________

                  TABLE 4
______________________________________
Digit 9
A Matrix
State i  1       2       3       4       5
______________________________________
 j
1        .793    *       *       *       *
2        .106    .939    *       *       *
3        .100    .061    .690    *       *
4        *       .000    .142    .930    *
5        *       *       .168    .070    1.000
______________________________________
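As an informal check on the A matrices, the following Python sketch transcribes Table 2 (digit 0) and verifies two properties implied by the description: each column, indexed by the source state i, sums to approximately one, and every entry outside the permitted left-to-right transitions is zero. The variable and function names are illustrative assumptions; prohibited transitions (*) are stored as 0.0, and the tolerance allows for the three-decimal rounding of the table.

```python
# Rows are destination state j, columns are source state i, as in Table 2.
A_DIGIT_0 = [
    [0.821, 0.0,   0.0,   0.0,   0.0],
    [0.143, 0.801, 0.0,   0.0,   0.0],
    [0.036, 0.199, 0.800, 0.0,   0.0],
    [0.0,   0.000, 0.079, 0.880, 0.0],
    [0.0,   0.0,   0.122, 0.120, 1.000],
]

def column_sums(table):
    """Total outgoing probability for each source state i."""
    n = len(table)
    return [sum(table[j][i] for j in range(n)) for i in range(n)]

def respects_constraints(table, max_skip=2):
    """True if no probability is assigned to a backward or too-long jump."""
    n = len(table)
    return all(
        table[j][i] == 0.0
        for i in range(n)
        for j in range(n)
        if j < i or j > i + max_skip
    )

sums = column_sums(A_DIGIT_0)
structure_ok = respects_constraints(A_DIGIT_0)
```

The same check can be applied to Tables 3 and 4 by transcribing their entries in the same orientation.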

Each of the A matrix tables is a 5×5 matrix representative of the probabilities of all transitions among the five states of the model of FIG. 1. As indicated in Tables 2, 3 and 4, only left-to-right transitions in FIG. 1 which do not have * or zero values are possible as per the constraints of the model. B matrices for the digits 0, 5 and 9 are shown in Tables 5, 6 and 7, respectively. Each column entry in Table 5 represents the probability of a particular prototype signal in the corresponding state for utterances of the digit "zero".

                  TABLE 5
______________________________________
State
m     1          2      3        4    5
______________________________________
 1    .059       .011   .001     .001 .015
 2    .025       .001   .015     .001 .004
 3    .001       .001   .001     .001 .048
 4    .007       .001   .001     .103 .001
 5    .002       .001   .001     .001 .007
 6    .046       .001   .001     .001 .003
 7    .001       .001   .001     .059 .001
 8    .001       .001   .001     .018 .001
 9    .001       .001   .001     .001 .004
10    .006       .028   .014     .008 .008
11    .001       .001   .001     .001 .101
12    .012       .001   .001     .001 .001
13    .001       .001   .001     .001 .025
14    .007       .001   .001     .001 .007
15    .001       .001   .001     .001 .008
16    .007       .001   .001     .001 .006
17    .031       .159   .001     .001 .010
18    .001       .001   .001     .001 .009
19    .028       .001   .001     .076 .006
20    .001       .001   .001     .001 .021
21    .005       .105   .011     .019 .003
22    .001       .001   .001     .001 .090
23    .078       .019   .001     .001 .001
24    .063       .001   .017     .001 .001
25    .001       .001   .001     .001 .090
26    .054       .001   .001     .001 .002
27    .002       .001   .137     .029 .008
28    .001       .007   .001     .001 .010
29    .011       .035   .001     .001 .001
30    .002       .001   .001     .001 .001
31    .021       .001   .169     .013 .001
32    .001       .001   .001     .001 .030
33    .015       .155   .001     .001 .001
34    .040       .001   .014     .021 .004
35    .001       .001   .001     .001 .021
36    .026       .002   .001     .001 .003
37    .004       .040   .032     .001 .001
38    .110       .011   .060     .003 .002
39    .001       .001   .001     .001 .004
40    .005       .001   .001     .022 .062
41    .001       .001   .001     .001 .033
42    .001       .003   .042     .017 .001
43    .044       .062   .001     .001 .001
44    .001       .001   .001     .001 .044
45    .066       .058   .012     .001 .001
46    .002       .002   .006     .305 .001
47    .001       
.001   .001     .001 .034                                    48    .022       .027   .001     .001 .001                                    49    .019       .001   .001     .001 .001                                    50    .016       .005   .001     .001 .047                                    51    .017       .006   .132     .223 .009                                    52    .035       .006   .003     .001 .001                                    53    .015       .010   .022     .004 .004                                    54    .001       .001   .001     .003 .090                                    55    .011       .141   .001     .001 .006                                    56    .001       .001   .001     .001 .045                                    57    .028       .001   .268     .006 .001                                    58    .001       .001   .001     .001 .020                                    59    .001       .001   .001     .001 .006                                    60    .011       .069   .001     .001 .016                                    61    .001       .001   .001     .003 .006                                    62    .004       .001   .001     .028 .005                                    63    .004       .001   .001     .001 .001                                    64    .016       .001   .001     .001 .002                                    ______________________________________                                    

                  TABLE 6
______________________________________
      State
 m    1      2      3      4      5
______________________________________
 1    .005   .003   .002   .001   .020
 2    .001   .001   .001   .001   .005
 3    .001   .001   .001   .014   .001
 4    .001   .001   .001   .001   .001
 5    .001   .001   .004   .001   .023
 6    .001   .001   .001   .001   .009
 7    .001   .001   .001   .001   .001
 8    .001   .001   .001   .001   .001
 9    .001   .002   .010   .038   .004
10    .001   .001   .001   .001   .004
11    .001   .001   .012   .001   .011
12    .001   .001   .001   .001   .001
13    .001   .004   .001   .038   .001
14    .001   .010   .004   .001   .031
15    .001   .098   .001   .001   .001
16    .004   .001   .075   .001   .004
17    .016   .001   .001   .001   .014
18    .001   .001   .001   .001   .001
19    .001   .001   .002   .077   .022
20    .001   .396   .019   .009   .001
21    .001   .001   .001   .001   .029
22    .001   .001   .001   .001   .001
23    .001   .001   .001   .001   .001
24    .001   .001   .001   .001   .012
25    .001   .102   .001   .060   .001
26    .001   .001   .001   .001   .010
27    .001   .001   .003   .001   .012
28    .001   .001   .001   .001   .001
29    .098   .001   .001   .001   .125
30    .001   .001   .001   .001   .001
31    .001   .001   .005   .001   .048
32    .001   .001   .001   .001   .001
33    .003   .001   .001   .001   .026
34    .001   .001   .001   .001   .026
35    .001   .032   .096   .441   .001
36    .001   .001   .001   .001   .017
37    .001   .001   .001   .001   .007
38    .001   .001   .001   .001   .068
39    .001   .001   .066   .066   .001
40    .003   .001   .360   .128   .013
41    .001   .005   .001   .001   .001
42    .001   .001   .001   .001   .001
43    .591   .001   .001   .001   .136
44    .001   .001   .001   .001   .001
45    .003   .001   .001   .001   .012
46    .001   .001   .001   .001   .004
47    .003   .242   .001   .003   .001
48    .001   .001   .001   .001   .025
49    .001   .001   .001   .001   .008
50    .036   .012   .149   .004   .047
51    .001   .001   .001   .001   .058
52    .009   .001   .001   .001   .005
53    .001   .001   .001   .001   .021
54    .003   .028   .009   .001   .001
55    .064   .001   .001   .001   .029
56    .003   .012   .133   .001   .001
57    .001   .001   .001   .001   .021
58    .001   .001   .001   .001   .001
59    .001   .005   .003   .072   .001
60    .112   .001   .001   .001   .053
61    .001   .001   .001   .001   .001
62    .001   .001   .001   .001   .009
63    .001   .001   .001   .001   .001
64    .001   .001   .001   .001   .004
______________________________________

                  TABLE 7
______________________________________
      State
 m    1      2      3      4      5
______________________________________
 1    .013   .001   .049   .001   .009
 2    .004   .001   .001   .001   .009
 3    .001   .009   .001   .016   .001
 4    .006   .001   .001   .001   .017
 5    .001   .022   .153   .060   .019
 6    .001   .001   .026   .001   .011
 7    .010   .001   .001   .001   .008
 8    .001   .001   .001   .001   .006
 9    .001   .051   .050   .010   .003
10    .084   .001   .001   .001   .030
11    .001   .028   .014   .010   .001
12    .001   .001   .001   .001   .003
13    .001   .010   .001   .015   .001
14    .001   .018   .069   .001   .002
15    .001   .015   .001   .103   .001
16    .001   .007   .230   .047   .001
17    .004   .001   .020   .001   .008
18    .005   .015   .004   .001   .001
19    .054   .001   .001   .002   .006
20    .001   .092   .001   .147   .001
21    .035   .001   .064   .001   .024
22    .001   .032   .003   .005   .001
23    .001   .001   .001   .001   .006
24    .018   .001   .001   .001   .020
25    .001   .001   .004   .052   .001
26    .010   .001   .001   .001   .011
27    .001   .011   .006   .001   .004
28    .024   .001   .001   .001   .008
29    .001   .001   .039   .001   .045
30    .004   .001   .001   .001   .002
31    .002   .001   .004   .001   .038
32    .001   .001   .001   .001   .002
33    .006   .001   .001   .001   .030
34    .052   .001   .019   .001   .019
35    .001   .184   .001   .039   .001
36    .108   .001   .001   .001   .085
37    .010   .001   .001   .001   .029
38    .025   .001   .048   .001   .031
39    .001   .236   .011   .025   .001
40    .001   .059   .029   .054   .013
41    .001   .002   .001   .001   .001
42    .008   .001   .001   .001   .017
43    .002   .001   .001   .001   .014
44    .001   .011   .001   .020   .001
45    .004   .001   .001   .001   .016
46    .034   .001   .001   .001   .032
47    .001   .001   .001   .180   .001
48    .001   .001   .001   .001   .041
49    .050   .001   .001   .001   .019
50    .001   .083   .033   .001   .010
51    .201   .001   .001   .001   .135
52    .001   .001   .001   .001   .003
53    .014   .001   .010   .001   .011
54    .030   .001   .001   .018   .005
55    .004   .001   .001   .001   .012
56    .001   .016   .015   .146   .002
57    .040   .001   .001   .001   .101
58    .006   .001   .001   .001   .001
59    .001   .053   .001   .007   .001
60    .001   .002   .062   .001   .006
61    .044   .001   .001   .001   .016
62    .048   .003   .001   .001   .008
63    .001   .001   .001   .001   .001
64    .010   .001   .001   .001   .035
______________________________________

There are 64 prototype probabilities in each state column so that the matrix size is 5×64. Tables 6 and 7, corresponding to digits "five" and "nine", are arranged in similar manner.

As indicated in the flow chart of FIG. 2, the Markov models stored in ROM 360 are retrieved therefrom in succession as addressed by pattern index k. For each model, a signal representative of the probability that the speech pattern quantized feature signal sequence matches the model is formed. The probability signal forming arrangements are shown in greater detail in FIGS. 5 and 6. In general, a Markov model is first selected. For the speech pattern to be recognized, the model is evaluated frame by frame with the quantized signal sequence O₁, O₂, . . . , O_(t), . . . , O_(T) as the input. Upon completion of the evaluation for the last speech pattern frame, a signal corresponding to the maximum probability that the speech pattern quantized signal sequence was derived from the model is generated.

The restrictions of the left-to-right hidden word Markov model used in the circuit of FIG. 3 require that the initial state for frame t=1 be only state 1 in FIG. 1 and that the log probability signal in the initial state be

    φ.sub.1 (1)=ln(b.sub.1 (O.sub.1))                      (7)

The φ₁(1) value is derived from the m=14 entry of the state 1 column of the B matrix for the digit. The log probability signals φ₁(i), i=2, 3, 4 and 5 for frame t=1 are set to -∞ since these states are not permitted in the model. The φ₂(j) log probability signals are then formed for frame t=2 in accordance with

    φ₂(j) = max_(i) {φ₁(i) + ln(a_(ij) b_(j)(O₂))}                      (8)

for max {1,j-2}≦i≦j using the transition probability signals in the A matrix for the digit, and the symbol probability signals in the B matrix corresponding to the second speech pattern frame quantized signal index m of Table 1. For each destination state j of speech pattern frame 2, the maximum log probability signal φ₂(j) is stored. The log probability signals for the successive states in the frame sequence are then generated using the A and B matrix signals of the digit model and the frame sequence of quantized speech pattern signal indices O_(t). After the processing of the last frame T, the maximum log probability signal is obtained for the digit model from the final state 5, from which transitions to other states are not allowed. State 5 is the absorbing state. The signal processing for the set of digits is performed successively and the largest of the maximum log probability signals as well as the corresponding digit identification signal is retained in storage. Upon completion of model processing for digit "nine", the speech pattern is identified as the digit identification code for the retained maximum log probability signal.
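By way of illustration only, the frame-by-frame formation of the log probability signals described above may be sketched in Python as follows. The function name viterbi_log_score, the zero-based list layout assumed for the A and B matrix signals, and the max_jump parameter are assumptions introduced for this sketch; it is not the program of processor circuit 340.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF

def viterbi_log_score(A, B, obs, max_jump=2):
    """Return phi_T(N), the best-path log probability ending in the last state.

    A[i][j] : assumed layout, transition probability from state i to state j
    B[j][m] : assumed layout, probability of symbol m in state j
    obs     : zero-based quantized symbol indices O_1 .. O_T
    """
    n = len(A)
    # Frame 1: only the first state is permitted; the others stay at -inf.
    phi = [NEG_INF] * n
    phi[0] = safe_log(B[0][obs[0]])
    for o in obs[1:]:
        new_phi = [NEG_INF] * n
        for j in range(n):
            beta = NEG_INF  # running maximum for destination state j
            # Source states i with max(first state, j - max_jump) <= i <= j.
            for i in range(max(0, j - max_jump), j + 1):
                alpha = phi[i] + safe_log(A[i][j]) + safe_log(B[j][o])
                if alpha > beta:
                    beta = alpha
            new_phi[j] = beta
        phi = new_phi
    return phi[-1]
```

With a hypothetical two-state model A = [[0.5, 0.5], [0.0, 1.0]] and B = [[0.9, 0.1], [0.2, 0.8]], the call viterbi_log_score(A, B, [0, 1]) follows the single path from state 1 to state 2 and returns ln(0.9 × 0.5 × 0.8).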

The Markov model processing of boxes 225, 230, 235 and 240 of FIG. 2, as performed by processor circuit 340, is shown in the flow chart of FIG. 5. Initially, box 501 is entered from box 220 on termination of the quantization mode. The maximum log probability signal is set to its minimum value -∞ and the selected reference pattern index k* is set to -1. The reference pattern index k is reset to -1 (box 505) and incremented to 0 (box 507). The current reference pattern index k is then compared to the final index value K as per box 510. Since k=0 at this time, box 515 is chosen and the A and B matrix signals for the k=0 digit, i.e., "zero", are addressed and are transferred from reference pattern Markov model signal store 360 to RAM 350 via processor circuit 340. The log probability signal for the digit zero, ln P₀, is then generated as per box 520. As aforementioned, the ln P₀ signal represents the probability that the quantized input speech pattern is obtained from the Markov model for digit zero. The flow chart of FIG. 6 shows the detailed arrangements of the ln P_(k) signal formation.
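The model selection loop of FIG. 5 may be sketched, under stated assumptions, as a running maximum over the reference patterns. The names select_best_model and score are hypothetical; score stands in for whatever forms ln P_(k) for one model, and the box numbers in the comments indicate only an approximate correspondence to the flow chart.

```python
NEG_INF = float("-inf")

def select_best_model(models, score):
    """Return (k_star, ln_p_max) after scanning the models in index order.

    models : sequence of reference pattern models (any objects)
    score  : callable returning the log probability ln P_k for one model
    """
    ln_p_max = NEG_INF  # box 501: start at the minimum value
    k_star = -1         # box 501: no reference pattern selected yet
    for k, model in enumerate(models):  # boxes 505, 507 and 510
        ln_p_k = score(model)           # box 520
        if ln_p_k > ln_p_max:           # boxes 525 and 530
            ln_p_max, k_star = ln_p_k, k
    return k_star, ln_p_max
```

For example, select_best_model([-3.0, -1.0, -2.0], score=lambda s: s) returns (1, -1.0), and an empty model list leaves k* at -1.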

In FIG. 6, signal φ₁(1) is set to ln (b₁(O₁)) (box 601) corresponding to the m=14 signal of column 1 in the B matrix of Table 5. The source state index i is set to 1 (box 605) and incremented (box 607). Until i>N, N=5 being the final state, φ₁(i) for i=2, 3, . . . N is set to -∞. The set of φ₁(1), φ₁(2), . . . φ₁(5) signals is stored in RAM 350. These φ₁(i) correspond to the constraint that the Markov model starts in its first state in the first speech pattern frame. FIG. 7 shows a trellis-type diagram illustrating the sequence of states of the Markov model for the successive input speech time frames 1, 2, 3 and 4. Column 710 corresponds to the first frame in which the speech pattern quantized index signal is O₁=14. Columns 720, 730 and 740 represent the second, third and fourth frames, respectively. The Markov states are listed in ascending order in each column. As shown in FIG. 7, only state 1 is possible in the first time frame.
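The first-frame initialization can be checked numerically against the tables. The sketch below is illustrative only: the function name initial_log_probs and the dictionary holding the single needed B matrix entry are assumptions. It uses the Table 5 value .007 for state 1 at m=14 and reproduces the -5.0 entry for frame 1, state 1 of Table 8.

```python
import math

NEG_INF = float("-inf")

def initial_log_probs(b_state_1, first_symbol, n_states=5):
    """Form phi_1(i): ln b_1(O_1) for state 1, -inf for every other state."""
    phi = [NEG_INF] * n_states
    phi[0] = math.log(b_state_1[first_symbol])
    return phi

# Table 5, state 1 column, m = 14 entry for the digit "zero" model.
phi_1 = initial_log_probs({14: 0.007}, first_symbol=14)
print(round(phi_1[0], 1))  # -5.0
```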

After the first time frame φ₁(i) signals are formed, boxes 615 and 620 are entered in succession so that the input speech time frame index t is set to 1 and incremented. Since time frame index t is not greater than the final time frame T (decision box 625), destination state index j is set to zero as per box 630. Destination index j is incremented to 1 in box 635 and compared to the final state N=5 (decision box 640). In accordance with the constraints of the hidden word Markov model shown in FIG. 1, only transitions to the same state or to the next two successive states are possible. Consequently, source state index i is set to zero (box 650) and incremented to 1 (box 652), corresponding to the Markov model restrictions. β, the running maximum for φ₂(j), is initially set to -∞ (box 650).

The incremented source state index i is compared to the current destination state index j=1 as per box 654 and signal forming box 660 is entered for speech pattern time frame t=2, source state index i=1 of the previous frame and destination state index j=1. Signal α in box 660 corresponds to the path from state 1 in column 710 (t=1) to state 1 in column 720 (t=2) and its value is obtained by summing previously generated signal φ₁(1) and ln (a₁₁ b₁(O₂)). Signal index O₂ is the quantized speech pattern signal for frame t=2 in Table 1; signal a₁₁ is obtained from the A matrix signals of Table 2 in column i=1 and row j=1 and b₁(O₂) is obtained from the m=14 entry of the state 1 column of the zero digit B matrix of Table 5. At this time α=-10.2, and β is set to this value as per boxes 665 and 670. Source state index incrementing (box 652) is then reentered so that i becomes 2.

Since source state index i is now greater than destination state index j=1, φ₂(1) is set to β (boxes 654 and 656) and destination state index j is incremented to 2 (box 635). Source state index i is reset to 0 and incremented to 1 in boxes 650 and 652. The α signal for t=2, i=1, j=2 indices is formed in box 660. In this way, the path from column 710 state 1 to column 720 state 2 is traversed in FIG. 7. The t=2, i=1, j=2 value of α replaces the β=-∞ signal (boxes 665 and 670).

When signal α is formed for t=2, i=2 and j=2, it is less than β since φ₁(2)=-∞. Consequently, β is not changed in box 670. Source state index i is then incremented (box 652). Incremented index i=3 is now greater than j=2 and φ₂(2) is set at the β value obtained for t=2, i=1 and j=2 (box 656). Similarly, φ₂(3) is set to the α signal for t=2, i=1 and j=3 as indicated in FIG. 7. The φ₁(i) signals for i>1 were set to -∞. Consequently, signals φ₂(j) for j>3 are set to -∞. Tables 8, 9 and 10 list the φ_(t)(j) log probability signals for the Markov model states in each time frame t; an asterisk marks a state whose signal is -∞.

                  TABLE 8
______________________________________
        State
Frame   1        2        3        4        5
______________________________________
 1      -5.0     *        *        *        *
 2      -10.2    -13.9    -15.3    *        *
 3      -17.3    -19.0    -20.4    -24.7    -21.0
 4      -24.4    -26.2    -27.6    -29.9    -26.6
 5      -27.4    -30.9    -34.7    -37.0    -30.9
 6      -34.6    -36.3    -37.7    -44.1    -33.3
 7      -37.7    -43.5    -44.8    -47.2    -39.4
 8      -44.8    -44.6    -48.0    -54.3    -43.9
 9      -51.9    -49.7    -53.1    -57.5    -48.5
10      -59.1    -54.9    -58.3    -62.6    -53.1
11      -63.8    -58.5    -63.5    -67.8    -59.6
12      -68.4    -62.1    -67.1    -73.0    -66.1
13      -72.2    -69.2    -70.6    -72.2    -71.1
14      -76.0    -76.4    -77.8    -74.9    -76.2
15      -79.4    -83.3    -82.3    -78.9    -81.7
16      -82.8    -88.1    -86.8    -82.9    -86.6
17      -87.2    -90.1    -93.1    -90.0    -88.1
18      -91.4    -94.3    -92.5    -91.6    -92.8
19      -95.0    -98.5    -98.7    -98.7    -99.7
20      -98.5    -102.1   -104.3   -105.8   -106.6
21      -102.1   -105.6   -107.8   -112.9   -113.3
22      -106.3   -109.2   -107.4   -111.9   -114.6
23      -110.6   -113.5   -109.7   -111.5   -114.2
24      -116.1   -119.5   -116.8   -115.4   -114.5
25      -121.5   -125.0   -124.0   -119.4   -117.3
26      -125.3   -130.4   -125.6   -124.6   -124.3
27      -129.1   -134.2   -127.1   -129.9   -131.2
28      -132.9   -138.0   -128.6   -134.8   -136.1
29      -136.6   -141.7   -130.2   -136.3   -137.7
30      -140.4   -145.5   -131.7   -137.9   -139.2
31      -144.2   -149.3   -133.3   -139.4   -140.7
32      -151.3   -153.1   -140.4   -142.7   -138.7
33      -155.0   -155.1   -147.6   -149.8   -143.3
34      -162.1   -162.3   -154.8   -156.9   -146.4
35      -169.3   -169.4   -162.0   -164.0   -151.1
36      -176.4   -175.5   -165.4   -168.2   -158.0
______________________________________

                  TABLE 9
______________________________________
        State
Frame   1        2        3        4        5
______________________________________
 1      -7.0     *        *        *        *
 2      -14.1    -13.5    -16.8    *        *
 3      -21.2    -19.1    -23.2    -22.9    -25.6
 4      -28.3    -25.3    -26.4    -26.3    -30.7
 5      -33.8    -31.3    -32.9    -33.3    -32.2
 6      -40.9    -33.6    -40.1    -36.2    -39.2
 7      -47.6    -40.7    -43.3    -43.3    -43.4
 8      -54.8    -47.7    -50.3    -50.3    -50.3
 9      -61.9    -54.7    -57.3    -57.3    -57.3
10      -69.0    -61.7    -64.4    -64.4    -64.2
11      -71.5    -68.3    -71.4    -71.4    -66.3
12      -74.0    -74.9    -78.0    -78.5    -68.3
13      -81.1    -81.9    -83.9    -81.1    -72.2
14      -88.2    -89.0    -90.3    -83.8    -76.0
15      -95.3    -96.0    -97.5    -90.8    -79.6
16      -102.4   -103.0   -104.7   -97.8    -83.2
17      -105.9   -107.5   -106.8   -103.5   -86.3
18      -113.0   -114.5   -114.0   -110.5   -89.2
19      -117.9   -121.5   -121.2   -117.6   -94.4
20      -122.8   -126.9   -128.3   -124.6   -99.7
21      -127.8   -131.8   -134.2   -131.7   -105.0
22      -134.9   -136.7   -139.1   -138.7   -107.8
23      -142.0   -143.7   -146.2   -145.7   -110.7
24      -148.0   -150.8   -147.4   -147.9   -115.0
25      -154.0   -157.0   -148.6   -150.0   -119.4
26      -160.7   -163.0   -155.8   -157.0   -123.3
27      -167.5   -169.7   -163.0   -164.1   -127.1
28      -174.2   -176.4   -170.2   -171.1   -131.0
29      -180.9   -183.1   -177.3   -178.2   -134.8
30      -187.6   -189.8   -184.5   -185.2   -138.7
31      -194.3   -196.6   -191.7   -192.2   -142.5
32      -200.3   -197.8   -198.9   -198.2   -149.4
33      -204.6   -204.8   -206.1   -205.2   -153.7
34      -211.7   -211.8   -213.2   -209.6   -160.6
35      -218.9   -218.8   -220.4   -216.6   -167.5
36      -226.0   -225.8   -227.6   -223.7   -174.5
______________________________________

                  TABLE 10
______________________________________
        State
Frame   1        2        3        4        5
______________________________________
 1      -6.9     *        *        *        *
 2      -14.1    -13.2    -11.9    *        *
 3      -21.3    -17.8    -19.2    -18.1    -20.6
 4      -28.4    -20.9    -22.6    -22.8    -26.5
 5      -33.0    -27.9    -26.0    -29.8    -29.1
 6      -40.2    -34.7    -31.9    -30.9    -34.6
 7      -45.0    -41.7    -39.3    -37.9    -38.1
 8      -49.0    -48.7    -46.6    -44.9    -43.0
 9      -52.9    -55.7    -53.9    -51.9    -47.8
10      -56.9    -62.1    -61.2    -59.0    -52.7
11      -64.0    -66.1    -62.4    -66.0    -55.8
12      -71.2    -73.1    -66.0    -71.3    -58.9
13      -74.4    -80.1    -73.3    -74.0    -63.7
14      -77.5    -83.5    -80.7    -80.2    -68.6
15      -80.7    -86.7    -83.8    -87.2    -72.5
16      -83.9    -89.9    -87.0    -92.7    -76.4
17      -91.1    -88.6    -89.6    -95.9    -81.0
18      -92.9    -95.6    -96.9    -98.5    -83.1
19      -100.1   -102.1   -102.2   -105.5   -88.9
20      -107.2   -109.1   -109.3   -111.1   -94.8
21      -114.4   -116.1   -116.5   -118.1   -100.7
22      -116.3   -123.1   -123.7   -125.1   -102.7
23      -118.1   -125.4   -125.5   -132.1   -104.7
24      -125.3   -123.2   -123.9   -130.4   -109.1
25      -132.4   -126.1   -127.8   -128.8   -113.4
26      -135.9   -133.1   -135.2   -135.8   -115.7
27      -139.3   -140.1   -142.5   -142.8   -118.0
28      -142.8   -147.1   -148.6   -149.9   -120.3
29      -146.2   -152.0   -152.0   -156.9   -122.6
30      -149.7   -155.4   -155.5   -160.9   -124.9
31      -153.1   -158.9   -158.9   -164.4   -127.2
32      -160.3   -162.3   -162.4   -162.6   -134.1
33      -166.0   -169.3   -166.5   -169.6   -138.9
34      -173.2   -173.0   -173.8   -172.6   -145.8
35      -178.8   -177.2   -179.7   -179.6   -152.8
36      -183.9   -184.2   -186.9   -186.6   -156.9
______________________________________

Row 2 of Table 8 lists the values for φ₂(1), φ₂(2), φ₂(3), φ₂(4) and φ₂(5) obtained in the Markov model signal processing indicated in FIG. 6 for the second speech frame.

The second speech frame processing is completed when destination state j becomes greater than the final state N=5 in decision box 640. At that time, speech frame index t is incremented to 3 (box 620) and the processing of φ₃(j) signals is initiated in box 630. As shown in FIG. 7, the possible transitions in speech pattern frame t=3 include transitions from state 1 of frame 2 (column 720) to states 1, 2 and 3 of frame 3 (column 730), from state 2 of frame 2 (column 720) to states 2, 3 and 4 of frame 3 (column 730), and from state 3 of frame 2 (column 720) to states 3, 4 and 5 of frame 3 (column 730). The processing of φ₃(j) signals is performed as described with respect to the prior speech pattern time frames in accordance with Equation 8. In frame t=3 and succeeding frames, however, there may be more than one source state for each destination state. In FIG. 7, for example, state 2 of column 730 may be reached from states 1 and 2 of column 720 and state 3 of column 730 may be reached from states 1, 2 or 3 of column 720. For each destination state, the maximum α signal generated is retained as the φ₃(j) signal through the operations of boxes 665 and 670. With respect to state 2 of column 730,

    φ₃(2) = max {φ₂(1) + ln(a₁₂ b₂(O₃)), φ₂(2) + ln(a₂₂ b₂(O₃))}

The φ₃(1), φ₃(2), φ₃(3), φ₃(4) and φ₃(5) signals obtained in the t=3 frame are listed in the third row of Table 8 and the φ₄(j) signals resulting from frame t=4 processing are listed in the fourth row of Table 8.
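The growth of the trellis of FIG. 7 under the left-to-right constraint may be illustrated with a short Python sketch. The function name reachable_states and the max_jump parameter are assumptions made for illustration, and the sketch further assumes that every permitted transition has a nonzero probability.

```python
def reachable_states(t, n_states=5, max_jump=2):
    """List the states (1-based) that can be occupied at frame t (1-based).

    The model starts in state 1 and may advance by at most max_jump states
    per frame, so the highest reachable state grows by max_jump each frame
    until the final state is reached.
    """
    highest = min(n_states, 1 + max_jump * (t - 1))
    return list(range(1, highest + 1))

for t in (1, 2, 3):
    print(t, reachable_states(t))
```

Frame 1 yields only state 1, frame 2 yields states 1 through 3, and frame 3 onward yields all five states, which agrees with the asterisk entries in the first two rows of Tables 8, 9 and 10.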

The signal processing shown in FIG. 6 for the successive speech frames is performed in accordance with the constraints of the hidden word Markov model to obtain the maximum probability of the input speech pattern "nine" being derived from the model A and B matrix signals for the digit "zero" for each state in each speech pattern time frame. After α is obtained for indices t=36, i=5 and j=5, the processing of the last time frame (T=36) is completed through boxes 665, 670, 652, 654 and 656. The φ_(T) (N)=-158.0 signal for the final state N=5 is then generated (box 656). This signal represents the maximum log probability that the speech pattern is derived from the digit zero Markov model and is listed in the last position of the final row (t=36) in Table 8.
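The frame-by-frame processing just described may be sketched in a modern language as follows. This is an illustrative sketch only, patterned on subroutine VA of Appendix C and not a transcription of it. The names `viterbi_log_prob`, `safe_log` and `max_skip`, the zero-based indexing, the `b[state][symbol]` layout and the two-state toy model are all assumptions made for the example; none of the toy values come from Tables 8 to 10.

```python
import math

NEG_INF = float("-inf")


def safe_log(p):
    """Natural log that maps a zero probability to -inf instead of raising."""
    return math.log(p) if p > 0.0 else NEG_INF


def viterbi_log_prob(a, b, obs, max_skip=2):
    """Return phi_T(N), the best-path log probability ending in the last state.

    a[i][j] : transition probability from state i to state j (0-based)
    b[j][o] : probability of prototype symbol o in state j
    obs     : sequence of prototype indices, one per speech frame
    """
    n = len(a)
    # First frame: the model is constrained to start in the first state.
    phi = [NEG_INF] * n
    phi[0] = safe_log(b[0][obs[0]])
    for o in obs[1:]:
        nxt = [NEG_INF] * n
        for j in range(n):  # destination state
            # Only source states j-max_skip .. j may enter state j.
            for i in range(max(0, j - max_skip), j + 1):
                alpha = phi[i] + safe_log(a[i][j] * b[j][o])
                if alpha > nxt[j]:  # keep the maximum alpha signal
                    nxt[j] = alpha
        phi = nxt
    return phi[-1]


# Hypothetical two-state, two-symbol model used only to exercise the sketch.
toy_a = [[0.5, 0.5], [0.0, 1.0]]
toy_b = [[0.9, 0.1], [0.2, 0.8]]
toy_score = viterbi_log_prob(toy_a, toy_b, [0, 1])  # ln(0.9) + ln(0.5 * 0.8)
```

Working in logarithms turns the products of probabilities into sums, which is why the table entries are negative numbers that decrease from frame to frame.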

When frame t becomes greater than the last speech pattern frame T=36, box 628 is entered from decision box 625 and the maximum probability signal for "zero" is stored. Box 507 of FIG. 5 is then reentered and the Markov processing for the digit "one" is initiated. Tables 9 and 10 illustrate the Markov model processing for the digits five and nine, respectively.

As indicated in boxes 525 and 530, after the maximum log probability signal for each digit is formed, it is compared to the largest of the preceding digit probability values and only the largest value and its identity code k* are stored. When processing for digit zero is terminated, ln P_(max) is set to -158.0 (Table 8) and k* is set to 0 as per box 530. The ln P_(k) signals for the digit set obtained in the arrangement of FIG. 3 for the input speech pattern "nine" are those for the final absorbing state 5 in frame t=36.

    ______________________________________
    digit k     ln P_(k)
    ______________________________________
    0           -158.0
    1           -160.4
    2           -184.9
    3           -158.8
    4           -186.0
    5           -174.5
    6           -175.3
    7           -160.4
    8           -168.9
    9           -156.9
    ______________________________________

Consequently, ln P_(max) and k* are unchanged from digit zero until the maximum log probability signal for the digit "nine" model is compared to ln P_(max) in decision box 525. As a result of the comparison box operation, box 530 is entered. The ln P_(max) signal is set to -156.9 and k* is set to 9. At the end of the Markov model evaluation mode, the stored maximum probability signal is -156.9 and the selected digit k*=9.
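The running comparison of boxes 525 and 530 may be checked against the tabulated ln P_(k) values with a short sketch such as the following. The function name `select_digit` and the `updates` list are hypothetical and are introduced only for illustration; the numerical values are those listed above for the input speech pattern "nine".

```python
# ln P_k for digits 0-9, as tabulated for the input speech pattern "nine".
LN_P = {0: -158.0, 1: -160.4, 2: -184.9, 3: -158.8, 4: -186.0,
        5: -174.5, 6: -175.3, 7: -160.4, 8: -168.9, 9: -156.9}


def select_digit(ln_p):
    """Scan the digits in order, keeping only the largest log probability.

    Returns (k_star, ln_p_max, updates), where updates lists the digits at
    which the stored maximum was replaced.
    """
    ln_p_max = float("-inf")
    k_star = None
    updates = []
    for k in sorted(ln_p):
        if ln_p[k] > ln_p_max:  # strictly larger, as in the comparison box
            ln_p_max, k_star = ln_p[k], k
            updates.append(k)
    return k_star, ln_p_max, updates


k_star, ln_p_max, updates = select_digit(LN_P)
print(k_star, ln_p_max, updates)  # 9 -156.9 [0, 9]
```

The stored maximum is replaced only at digit zero and again at digit nine, in agreement with the description above.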

The just described digit recognition arrangement may be utilized to recognize a series of utterances of letters, digits or words as in a telephone or credit card number. After the selection of the reference model with the maximum probability signal P(O|K) as per box 240 in FIG. 2, a reference index signal is generated (box 245) and transmitted to utilization device 280, which may be a telephone switching arrangement or a business transaction data processor. Decision box 205 is then entered so that the next speech pattern of the spoken input may be processed. The arrangement of FIG. 3 may be extended to recognize other speech patterns such as phrases or sentences by selecting appropriate Markov model reference templates. In contrast to prior Markov model speech recognition arrangements in which models of small speech elements, e.g., phonemes, are used, our invention utilizes a single model of the entire reference pattern, e.g., a word or phrase, to identify an utterance as a reference pattern. Advantageously, the number of states required for recognition is reduced, difficulties in concatenating phonemic or other elemental speech segment models are avoided and speaker-independent operation is achieved from available data bases. The Markov model templates stored in ROM 360 are generated from utterances of identified speech patterns that may be from any source and from different speakers. Patterns from readily available data banks of recorded utterances may be used to generate Markov models for the speaker-independent recognition arrangement of FIG. 3.

While the invention has been shown and described with reference to a particular illustrative embodiment, it is to be understood that various modifications in form and detail may be made by those skilled in the art without departing from the spirit and scope thereof.

    ______________________________________
    APPENDIX A
    ______________________________________
    C     ENERGY BASED ENDPOINT DETECTOR
          SUBROUTINE ENDPTS(E,IS,IE)
    C
    C     E=ENERGY OF FRAME
    C     IS=1 IF WORD HAS STARTED, 0 OTHERWISE
    C     IE=1 TO INDICATE END OF WORD
    C
          EMIN=1.E6
    C
          IF(E.GT.EMIN.AND.IS.EQ.0) IS=1
          IF(IS.EQ.1.AND.E.LT.EMIN) IE=1
    C
          RETURN
          END
    C     LPCENG--CALCULATE LPC AND ENERGY FOR A GIVEN SPEECH FRAME
    C
          SUBROUTINE LPCENG(S,NL,U,IP,N)
          DIMENSION S(300),U(200,10),R(10),PAR(10),APREV(10)
    C
    C     S=SPEECH ARRAY
    C     NL=NO OF SAMPLES FOR LPC AND ENERGY ANALYSIS
    C     U=MATRIX OF LPC COEFFICIENTS WITH ENERGY STORED IN LAST POSITION
    C     IP=NO OF COEFFICIENTS (LPC + ENERGY) PER FRAME
    C     N=CURRENT FRAME NUMBER
    C
    C     WINDOW SPEECH SAMPLES BY HAMMING WINDOW
    C
          DO 10 J=1,NL
    10    S(J)=S(J)*(0.54-0.46*COS(6.28318*(J-1)/(NL-1)))
    C
    C     MEASURE AUTOCORRELATION OF WINDOWED FRAME
    C
          DO 20 J=1,IP-1
          R(J)=0.
          DO 15 K=1,NL-J+1
    15    R(J)=R(J)+S(K)*S(K+J-1)
    20    CONTINUE
    C
    C     SAVE LOG ENERGY
    C
          U(N,IP)=10.*ALOG10(R(1))
    C
    C     CALCULATE LPC COEFFICIENTS
    C
          J=1
          RES=R(J)
    30    PAR(J)=0
          J1=J-1
          IF(J1.LT.1) GO TO 50
          DO 40 K=1,J1
          IJ=J-K+1
    40    PAR(J)=PAR(J)+APREV(K)*R(IJ)
    50    PAR(J)=(-PAR(J)-R(J+1))/RES
    55    A(J)=PAR(J)
          J1=J-1
          IF(J1.LT.1) GO TO 70
          DO 60 K=1,J1
          IJ=J-K
    60    A(K)=APREV(K)+PAR(J)*APREV(IJ)
    70    RES=(1.-PAR(J)*PAR(J))*RES
          DO 80 L=1,J
    80    APREV(L)=A(L)
          J=J+1
          IF(J.LE.IP-2) GO TO 30
    C
    C     CONVERT TO REFERENCE FORMAT
    C
          APREV(1)=1.
          DO 90 J=1,IP-2
    90    APREV(J+1)=A(J)
          DO 100 J=1,IP-1
          I1=IP+I-J
          A(J)=APREV(J)
          DO 110 K=2,I1
          K1=K+J-1
    110   A(J)=A(J)+APREV(K)*APREV(K1)
    100   CONTINUE
          A(IP-1)=APREV(IP-1)
          DO 120 J=1,IP-1
          IF(J.EQ.1)U(J,I)=A(J)
          IF(J.NE.1)U(J,I)=2.*A(J)
    120   CONTINUE
    C
          RETURN
          END
    ______________________________________

    ______________________________________
    APPENDIX B
    ______________________________________
    C     VECTOR QUANTIZER
          DIMENSION R(9), U(9)
          INTEGER T, O(75), P
          LOGICAL ST
    C
    C     SET UP CONSTANTS
    C
          P=9
          M=64
          N=5
    C
    C     WAIT FOR RECOGNIZER TO BE AVAILABLE;
    C     ST IS TRUE WHEN INPUT IS FINISHED
    C
    100   IF(.NOT.ST) GO TO 100
    C
    C     BEGIN MAIN LOOP TO QUANTIZE EACH FRAME
    C
          DO 2 LT=1,T
    C
    C     GET A FRAME OF ACOUSTIC FEATURES
    C
          IDEV=370
          CALL GETDAT(IDEV,LT,P,U)
          DMIN=1.0E75
    C
    C     BEGIN SECONDARY LOOP TO FIND BEST
    C     PROTOTYPE VECTOR
    C
          DO 2 LM=1,M
    C
    C     GET A PROTOTYPE VECTOR
    C
          IDEV=365
          CALL GETDAT(IDEV,LM,P,R)
          DUR=0.0
    C
    C     BEGIN INNER LOOP TO COMPUTE DISTANCE
    C
          DO 1 LP=1,P
    1     DUR=DUR+U(LP)*R(LP)
          DUR=ALOG(DUR)
    C
    C     TEST FOR MINIMUM DISTANCE; KEEP THE SMALLEST
    C     DISTANCE SO FAR AND THE INDEX OF ITS PROTOTYPE
    C
          IF(DUR.GE.DMIN) GO TO 2
          DMIN=DUR
          O(LT)=LM
    2     CONTINUE
    ______________________________________

    ______________________________________
    APPENDIX C
    ______________________________________
    C     MARKOV MODEL EVALUATION
    C     COMPUTE THE PROBABILITIES OF ALL
    C     OF THE MODELS
    C
          CALL MODPROB(M,N,T,O)
    C
    C     MAKE RECOGNIZER AVAILABLE
    C
          ST=.FALSE.
          GO TO 100
          END
    C
    C     SUBROUTINE TO COMPUTE MODEL PROBABILITIES
    C
          SUBROUTINE MODPROB(M,N,T,O)
          DIMENSION A(5,5), B(64,5)
          INTEGER T,O(T)
          PMAX=-1.0E75
    C
    C     MAIN LOOP TO COMPUTE MODEL PROBABILITIES
    C
          DO 1 K=0,9
    C
    C     GET MARKOV MODEL
    C
          IDEV=360
          CALL GETDAT(IDEV,K,N*(N+M),A,B)
    C
    C     COMPUTE LOG PROBABILITY OF MODEL K
    C
          CALL VA(M,N,T,A,B,O,PK)
    C
    C     CHECK FOR LARGEST PROBABILITY
    C
          IF(PK.LE.PMAX) GO TO 1
          PMAX=PK
          KSTAR=K
    1     CONTINUE
    C
    C     SEND SIGNAL TO UTILIZATION DEVICE
    C
          CALL USEND(KSTAR)
          RETURN
          END
    C
    C     SUBROUTINE TO CALCULATE LOG PROBABILITY
    C     OF A MODEL
    C
          SUBROUTINE VA(M,N,T,A,B,O,PK)
          DIMENSION A(N,N), B(M,N), PHI(75,5)
          INTEGER T, O(T)
    C
    C     LOOP TO INITIALIZE PARTIAL LOG
    C     PROBABILITIES
    C
          PHI(1,1)=ALOG(B(O(1),1))
          DO 1 I=2,N
    1     PHI(1,I)=-1.0E75
    C
    C     MAIN LOOP TO CALCULATE PARTIAL LOG
    C     PROBABILITIES
    C
          DO 3 LT=2,T
    C
    C     INTERMEDIATE LOOP FOR DESTINATION STATES
    C
          DO 3 J=1,N
          BETA=-1.0E75
    C
    C     SET UP CONSTRAINT ON TRANSITIONS
    C
          IST=MAX0(1,J-2)
    C
    C     INNER LOOP TO COMPUTE BEST SOURCE STATE
    C
          DO 2 I=IST,J
          ALPHA=PHI(LT-1,I)+ALOG(A(I,J)*B(O(LT),J))
          IF(ALPHA.GT.BETA)BETA=ALPHA
    2     CONTINUE
    C
    C     STORE BEST INTERMEDIATE PROBABILITY
    C
    3     PHI(LT,J)=BETA
    C
    C     STORE MODEL PROBABILITY
    C
          PK=PHI(T,N)
          RETURN
          END
    ______________________________________

What is claimed is:
 1. A speech analyzer for recognizing an utterance as one of a plurality of reference patterns each having a frame sequence of acoustic feature signals comprising: means for storing a set of K signals each representative of a prescribed acoustic feature of said plurality of reference patterns; means for storing a plurality of templates each representative of an identified spoken reference pattern, the template of each spoken reference pattern comprising signals representative of a first state, a last state and a preselected number N-2 intermediate states between said first and last states of a constrained hidden Markov model of said spoken reference pattern, N being independent of the number of acoustic feature frames in the acoustic feature frame sequence of the identified spoken reference pattern, a plurality of first type signals each representative of the likelihood of a prescribed acoustic feature signal of a reference pattern frame being in a predetermined one of said states, and a plurality of second type signals each representative of the likelihood of a transition from a prescribed acoustic feature signal in one of said states to another of said states of said template; means responsive to the utterance for forming a time frame sequence of acoustic feature signals representative of the speech pattern of the utterance; means responsive to said utterance feature signal sequence and said stored prescribed acoustic feature signals for selecting a sequence of said prescribed feature signals representative of the utterance speech pattern; means jointly responsive to said sequence of prescribed feature signals representative of the utterance and the reference pattern template N state constrained hidden Markov model signals for combining said utterance representative sequence of prescribed feature signal sequence with said reference pattern N state Markov model template signals to form a third type signal representative of the likelihood of the unknown utterance being the spoken reference pattern; and means responsive to the third type signals for the plurality of reference patterns for generating a signal to identify the utterance as one of the plurality of reference patterns.
 2. A speech analyzer according to claim 1 wherein said third type signal generating means comprises: means for successively generating speech pattern frame processing interval signals for the sequence of prescribed acoustic feature signals; means operative in the current speech pattern frame processing interval responsive to the utterance representative prescribed feature signal of the current speech pattern frame and the reference pattern template N state constrained hidden Markov model signals for producing a set of signals representative of the likelihood of the utterance speech pattern being in a prescribed state of the Markov model template during said speech frame; and means operative in the speech pattern frame processing intervals responsive to the likelihood representative signal corresponding to the reference pattern Markov model template being in the last state during the last speech pattern frame for forming a signal representative of the probability of the speech pattern being obtained from the reference pattern N state Markov model template.
 3. A speech analyzer according to claim 2 wherein: said means for producing said set of likelihood representative signals in each speech pattern frame processing interval comprises means responsive to the first and second type signals for generating a set of signals representative of the probability that the reference template Markov model for the utterance speech pattern portion up to the current frame is in each of the reference template Markov model states.
 4. A speech analyzer according to claim 1 wherein N is smaller than the number of frames in the sequence of acoustic feature signal frames of the smallest of said reference patterns.
 5. A speech analyzer according to claim 3 wherein: said means for storing the set of K prescribed feature signals comprises means for storing K linear predictive feature prototype signals covering the range of acoustic features of the frame sequence of acoustic feature of the reference patterns; and said means for forming a sequence of acoustic feature signals representative of the utterance speech pattern comprises means for forming a sequence of linear predictive feature signals representative of the utterance speech pattern.
 6. A speech analyzer according to claim 3 wherein the second type signals corresponding to transitions from a first distinct state i<N to a second distinct state j<=N, j<i and j>i+2 are zero value signals.
 7. A speech analyzer according to claim 1, 2, 3, 4, 5, or 6 wherein said speech pattern is an utterance of a word and each reference pattern is an identified spoken word speech pattern.
 8. A method for recognizing an utterance as one of a plurality of reference patterns each having a time frame sequence of acoustic feature signals comprising the steps of: storing a set of K signals each representative of a prescribed acoustic feature of said plurality of reference patterns; storing a plurality of templates each representative of an identified spoken reference pattern, the template of each spoken reference pattern comprising signals representative of a first state, a last state and a preselected number N-2 of intermediate states between said first and last states of a constrained hidden Markov model of said spoken reference pattern, N being independent of the number of acoustic feature frames in the acoustic feature frame sequences of the identified spoken reference patterns, a plurality of first type signals each representative of the likelihood of a prescribed acoustic feature of a reference pattern frame being in a predetermined one of said states, and a plurality of second type signals each representative of the likelihood of a transition from a prescribed acoustic feature signal in one of said states to another of said states of said templates; forming a time frame sequence of acoustic feature signals representative of the speech pattern of the utterance; selecting a sequence of said prescribed feature signals representative of the utterance speech pattern responsive to the utterance feature signal sequence and the K stored prescribed acoustic feature signals; combining said sequence of prescribed feature signals representative of the utterance and the N state constrained hidden Markov model signals of the reference pattern template to form a third type signal representative of the likelihood of the unknown utterance being the spoken reference pattern; and generating a signal to identify the utterance as one of the reference patterns responsive to the third type signals for the plurality of reference patterns.
 9. A method for recognizing an utterance as one of a plurality of reference patterns according to claim 8 wherein generation of said third type signals comprises the steps of: successively generating speech pattern frame processing interval signals; in the current speech pattern frame processing interval responsive to the prescribed feature signal of the current utterance speech pattern frame and the reference pattern template N state constrained hidden Markov model signals, producing a set of signals representative of the likelihood of the utterance speech pattern being in a prescribed state of the N state Markov model template during said speech pattern frame; and in the speech pattern frame processing intervals responsive to the likelihood representative signal corresponding to the reference pattern Markov model template being in the last state during the last speech pattern frame, forming a signal representative of the probability of the speech pattern being obtained from the reference pattern N state Markov model template.
10. A method for recognizing an utterance as one of a plurality of reference patterns according to claim 9 wherein: the step of producing said set of likelihood representative signals in each speech pattern frame processing interval comprises: generating a set of signals representative of the probability that the reference template Markov model for the utterance speech pattern portion up to the current frame is in each of the reference template Markov model states responsive to the first and second type signals.
11. A method for recognizing an utterance as one of a plurality of reference patterns according to claim 8 wherein: N is smaller than the number of frames in the sequence of acoustic feature signal frames of the smallest of said reference patterns.
 12. A method for recognizing an utterance as one of aplurality of reference patterns according to claim 10 wherein:the stepof storing the set of K prescribed feature signals comprises storing Klinear predictive feature prototype signals covering the range ofacoustic features of the frame sequence of acoustic feature signals ofthe reference patterns; and the step of forming a sequence of acousticfeature signals representative of the utterance speech pattern comprisesforming a sequence of linear predictive feature signals representativeof the utterance speech pattern.
13. A method for recognizing an utterance as one of a plurality of reference patterns according to claim 12 wherein the second type signals corresponding to transitions from a first distinct state i<N to a second distinct state j<=N, j<i and j>i+2 are zero value signals.
14. A method for recognizing an utterance as one of a plurality of reference patterns according to claims 8, 9, 10, 11, 12, or 13 wherein said speech pattern is an utterance of a word and each reference pattern is an identified spoken word speech pattern.
15. A speech analyzer for recognizing an utterance as one of a plurality of vocabulary words comprising: a first memory for storing a set of K vector quantized prototype signals each representative of a linear predictive acoustic feature in the frame sequence of acoustic features of utterances of said plurality of vocabulary words; a second memory for storing a plurality of vocabulary reference templates, each template corresponding to an N state constrained hidden Markov model of a vocabulary word and including; a signal corresponding to an initial state of said constrained hidden Markov model, signals corresponding to N-2 intermediate states of said constrained hidden Markov model, a signal corresponding to the Nth final state of said constrained hidden Markov model, the number of states N being preselected to be less than the number of acoustic features in the sequence of acoustic features of the shortest vocabulary word, a set of first type signals each representative of the probability of a prototype feature signal being in a predetermined state of said constrained hidden Markov model, and a set of second type signals each representative of the probability of transition between a predetermined pair of said vocabulary word constrained hidden Markov model states; first means responsive to the utterance for forming an M time frame sequence of linear predictive acoustic feature signals representative of the speech pattern of the utterance; second means operative responsive to said speech pattern feature signals and said stored prototype acoustic feature signals for generating a sequence of M prototype acoustic feature signals representative of said utterance speech pattern; said second means being jointly responsive to said sequence of M prototype feature signals representative of the utterance and the signals of the N state constrained hidden Markov model of the vocabulary word template for forming a third type signal representative of the likelihood of the unknown utterance being the vocabulary word including means for producing a sequence of speech pattern frame processing interval signals, said second means being operative in the first time processing interval responsive to the first frame prototype feature signal, the vocabulary word Markov model first state and first type signals for forming a signal representative of the likelihood of the first frame prototype feature signal being in the vocabulary word Markov model first state, and operative in each of the second to the Mth speech pattern frame processing intervals responsive to the Markov model state signals, the current frame prototype feature signals, the first type and second type signals, and the likelihood signals of the immediately preceding frame processing interval for forming a set of signals each representative of the likelihood of the current frame prototype feature signal being in a prescribed state of the vocabulary word Markov model, and means responsive to the likelihood signal corresponding to the Nth final state in the Mth speech pattern frame processing interval for generating the third type signal for said vocabulary word representative of the likelihood of the utterance being the vocabulary word; and means responsive to the third type signals for the plurality of vocabulary words for generating a signal identifying the utterance as the vocabulary word having the largest third type signal. .Iadd.
16. A speech analyzer for recognizing an utterance as one of a plurality of reference patterns each having a frame sequence of acoustic feature signals comprising: means for storing signals representative of acoustic features of said plurality of reference patterns; means for storing a plurality of templates each representative of an identified spoken reference pattern, the template of each spoken reference pattern comprising signals representative of a first state, a last state and preselected number N-2 intermediate states between said first and last states of a constrained hidden Markov model of said spoken reference pattern, N being independent of the number of acoustic feature frames in the acoustic feature frame sequence of the identified spoken reference pattern, a plurality of first type signals each representative of the likelihood of an acoustic feature signal of a reference pattern frame being in a predetermined one of said states, and a plurality of second type signals each representative of the likelihood of a transition from an acoustic feature signal in one of said states to another of said states of said template; means responsive to the utterance for forming a time frame sequence of acoustic feature signals representative of the speech pattern of the utterance; means responsive to said utterance feature signal sequence and said stored acoustic feature signals for selecting a sequence of said feature signals representative of the utterance speech pattern; means jointly responsive to said sequence of feature signals representative of the utterance and the reference pattern template N state constrained hidden Markov model signals for combining said utterance representative sequence of feature signal sequence with said reference pattern N state Markov model template signals to form a third type signal representative of the likelihood of the unknown utterance being the spoken reference pattern; and means responsive to the third type signals for the plurality of reference patterns for generating a signal to identify the utterance as one of the plurality of reference patterns. .Iaddend. .Iadd.
17. A speech analyzer according to claim 16 wherein said third type signal generating means comprises: means for successively generating speech pattern frame processing interval signals for the sequence of acoustic feature signals; means operative in the current speech pattern frame processing interval responsive to the utterance representative feature signal of the current speech pattern frame and the reference pattern template N state constrained hidden Markov model signals for producing a set of signals representative of the likelihood of the utterance speech pattern being in a particular state of the Markov model template during said speech frame; and means operative in the speech frame processing intervals responsive to the likelihood representative signal corresponding to the reference pattern Markov model template being in the last state during the last speech pattern frame for forming a signal representative of the likelihood of the speech pattern being obtained from the reference pattern N state Markov model template. .Iaddend. .Iadd.
18. A speech analyzer according to claim 17 where: said means for producing said set of likelihood representative signals in each speech pattern frame processing interval comprises means responsive to the first and second type signals for generating a set of signals representative of the likelihood that the reference template Markov model for the utterance speech pattern portion up to the current frame is in each of the reference template Markov model states. .Iaddend.
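By way of illustration only, and without limiting the claims, the steps recited in claims 8 through 13 may be sketched in Python as follows. The sketch rests on several assumptions that are not taken from the claims: the function names (`quantize`, `is_constrained`, `score`, `recognize`) are hypothetical, the nearest prototype is selected by squared Euclidean distance rather than a linear predictive distance measure, and the frame-by-frame likelihoods are accumulated by a sum over predecessor states (a forward recursion), whereas a maximum over predecessor states would be another reading of the same steps. States are numbered from 0 in the code, so the first state is index 0 and the Nth final state is index N-1.

```python
def quantize(frames, prototypes):
    """Replace each feature frame by the index of its nearest stored prototype.

    Squared Euclidean distance is an assumption made for this sketch.
    """
    def dist(u, v):
        return sum((x - y) ** 2 for x, y in zip(u, v))
    return [min(range(len(prototypes)), key=lambda k: dist(f, prototypes[k]))
            for f in frames]


def is_constrained(a):
    """Check that transitions with j < i or j > i + 2 are zero (claim 13)."""
    n = len(a)
    return all(a[i][j] == 0.0
               for i in range(n) for j in range(n)
               if j < i or j > i + 2)


def score(symbols, a, b):
    """Likelihood of a prototype index sequence for one N state template.

    a[i][j]: second type signals (transition from state i to state j).
    b[j][k]: first type signals (prototype k observed in state j).
    The model starts in the first state and is read out in the last state.
    """
    n = len(a)
    # First frame processing interval: only the first state is occupied.
    alpha = [0.0] * n
    alpha[0] = b[0][symbols[0]]
    # Second to last intervals: combine the preceding interval's likelihoods
    # with the transition and observation signals for the current frame.
    for o in symbols[1:]:
        alpha = [sum(alpha[i] * a[i][j] for i in range(n)) * b[j][o]
                 for j in range(n)]
    # Third type signal: likelihood of being in the last state at the last frame.
    return alpha[-1]


def recognize(frames, prototypes, templates):
    """Identify the utterance as the template with the largest score."""
    symbols = quantize(frames, prototypes)
    scores = {name: score(symbols, a, b) for name, (a, b) in templates.items()}
    return max(scores, key=scores.get), scores


# Toy data, invented for illustration: K = 2 prototypes, N = 2 states.
prototypes = [(0.0,), (1.0,)]
a = [[0.5, 0.5], [0.0, 1.0]]
templates = {
    "low_high": (a, [[0.9, 0.1], [0.1, 0.9]]),
    "high_low": (a, [[0.1, 0.9], [0.9, 0.1]]),
}
best, scores = recognize([(0.1,), (0.2,), (0.9,)], prototypes, templates)
```

For utterances of realistic length the products of probabilities become very small, so a practical arrangement would likely accumulate logarithms or rescale the likelihoods at each frame; that refinement is omitted here for brevity.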